Visualize Statistical Summary Reports from Partnering Healthcare Providers

Fan Wang

Nov 2 2020

In the fight against COVID-19, there are many obstacles to analyzing and exploring health data from multiple healthcare providers. To address them, we have been working with various regional healthcare provider organizations to identify shared common data elements that can be used to harmonize and manage data sharing within our Chicagoland Pandemic Response Commons. We established a Statistical Summary Reports (SSR) node in our data model to store a range of de-identified statistical reports from multiple partnering healthcare providers. Our goal is to accelerate the exchange of health data among healthcare facilities, researchers, and response authorities during public health emergencies. For more information, please visit the Chicagoland Pandemic Response Commons.

This notebook provides an example of how to visualize the status of COVID-19 and general patient information at multiple regional healthcare providers, using demo data from 03/10/2020 to 11/10/2020. The Chicagoland Pandemic Response Commons disclaims responsibility concerning the data’s accuracy, reliability, completeness, timeliness, or usefulness.

How to interact with the figures

  • Move the time slider or click the "Play" and "Pause" buttons to see how the running total of confirmed cases changes over time.
  • Hovering over the lines or dots shows the number of patients for the corresponding date and healthcare provider.

Setup notebook

Uncomment the pip install lines below to install any missing libraries, then import the required modules:

In [2]:
# !pip install plotly==4.7.1
# !pip install gen3
import requests, json, fnmatch, os, os.path, sys, subprocess, glob, ntpath, copy
import pandas as pd
import numpy as np
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
from collections import Counter
import gen3
from gen3.auth import Gen3Auth
from gen3.submission import Gen3Submission
from gen3.file import Gen3File
import plotly.graph_objects as go
import plotly.express as px
import warnings
warnings.simplefilter('ignore')

Obtain data using Gen3 SDK

To extract the data we need, we export the Statistical Summary Reports (SSR) node from the Chicagoland Pandemic Response Commons. For users who have been granted data access, an API key can be created on the Profile page by clicking the “Create API key” button.

In [3]:
api = "https://chicagoland.pandemicresponsecommons.org"
creds = "credentials.json"
auth = Gen3Auth(api, refresh_file=creds)
sub = Gen3Submission(api, auth)
file = Gen3File(api, auth)
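
Authentication fails with a fairly opaque error when the credentials file is missing or incomplete, so it can help to check the file first. The following is a minimal sketch, not part of the Gen3 SDK: the helper name load_gen3_credentials is hypothetical, and it assumes the downloaded file is a JSON object containing "api_key" and "key_id" fields.

```python
import json
from pathlib import Path


def load_gen3_credentials(path="credentials.json"):
    """Read a credentials file and confirm it has the expected fields.

    Hypothetical helper for illustration; it is not part of the Gen3 SDK.
    """
    cred_path = Path(path)
    if not cred_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {cred_path}")

    with cred_path.open() as handle:
        creds = json.load(handle)

    # Assumption: the file downloaded from the Profile page is a JSON
    # object with "api_key" and "key_id" fields.
    missing = {"api_key", "key_id"} - set(creds)
    if missing:
        raise KeyError(f"Credentials file is missing: {sorted(missing)}")
    return creds
```

Calling load_gen3_credentials("credentials.json") before constructing Gen3Auth surfaces a missing or truncated file early. The key itself should be kept out of version control.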

Export the metadata stored under a specific node and project using the SDK function export_node:

In [4]:
statistical_summary = sub.export_node(
    "controlled",
    "SSR",
    "statistical_summary_report",
    "tsv",
    "./statistical_summary_report.tsv",
)
statistical_summary = pd.read_csv("./statistical_summary_report.tsv", sep="\t")
Output written to file: ./statistical_summary_report.tsv

Clean and extract the data needed for visualization

Please note that small cell counts (less than 5 data points) for any property are not submitted to Gen3 and instead will be entered as null values to protect patient privacy.
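
Because suppressed counts arrive as nulls, it is worth checking how many values are missing per column before plotting. The sketch below uses made-up numbers (not real report data) and only the column names that appear in this notebook. It keeps nulls as NaN instead of filling them with 0, since a suppressed count is unknown rather than zero.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only; not real report data.
demo = pd.DataFrame(
    {
        "Date": ["2020-03-10", "2020-03-11", "2020-03-12"],
        "num_COVID": [12, np.nan, 30],
        "num_icu": [np.nan, np.nan, 6],
    }
)

# Count suppressed (null) entries in each numeric column.
suppressed = demo.drop(columns="Date").isna().sum()
print(suppressed)

# Keep only the rows where a given measure was actually reported.
# The nulls are left as NaN (not replaced by 0) in the original frame.
reported = demo.dropna(subset=["num_COVID"])
print(reported)
```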

In [5]:
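# The reporting date is the last underscore-separated token of submitter_id
statistical_summary["date"] = (
    statistical_summary["submitter_id"].str.split("_").str.get(-1)
)
# Sort by date; drop the old index rather than keeping it as a column
statistical_summary = statistical_summary.sort_values(
    by="date", ascending=True
).reset_index(drop=True)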
statistical_summary = statistical_summary[
    [
        "date",
        "num_COVID",
        "num_COVID_deaths",
        "num_admitted",
        "num_outpatient",
        "num_asth",
        "num_card",
        "num_chf",
        "num_diab",
        "num_icu",
        "num_obes",
        "num_pneu",
        "num_resp",
        "num_vent",
    ]
]
statistical_summary = statistical_summary.rename(columns={"date": "Date"})
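
The cell above sorts the extracted dates as strings. That gives chronological order when the suffix is ISO-formatted (YYYY-MM-DD), but parsing it into real datetimes is more robust and gives Plotly a true date axis. The sketch below assumes a hypothetical submitter_id layout ending in an ISO date; the actual identifiers in the commons may differ.

```python
import pandas as pd

# Hypothetical submitter_id values; the real format may differ.
demo = pd.DataFrame(
    {
        "submitter_id": [
            "ssr_siteA_2020-11-10",
            "ssr_siteA_2020-03-10",
            "ssr_siteB_2020-05-28",
        ]
    }
)

# Take the last underscore-separated token and parse it as a date.
# Assumption: the token is formatted as YYYY-MM-DD.
demo["Date"] = pd.to_datetime(
    demo["submitter_id"].str.split("_").str.get(-1), format="%Y-%m-%d"
)

# Sorting on a datetime column is chronological regardless of how the
# original text was formatted.
demo = demo.sort_values("Date").reset_index(drop=True)
print(demo)
```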

Total number of confirmed COVID-19 patients and deaths from partnering healthcare providers

Confirmed cases in the demo data are available from 03/10/2020 to 11/10/2020. The number of COVID-19 deaths is available from 05/28/2020 to 11/10/2020.

In [6]:
# Build the figure, starting with the confirmed-case trace
figure = go.Figure(
    go.Scatter(
        x=statistical_summary.Date,
        y=statistical_summary.num_COVID,
        mode="markers",
        name="Confirmed",
    )
)
# Add the deaths trace
figure.add_trace(
    go.Scatter(
        x=statistical_summary.Date,
        y=statistical_summary.num_COVID_deaths,
        mode="markers",
        name="Death",
    )
)
figure.update_layout(
    hovermode="closest",
    title="Number of Confirmed COVID-19 Cases and Deaths",
    xaxis_title="Date",
    yaxis_title="Number of Confirmed Cases/Deaths",
    autosize=True,
    width=900,
    height=650,
    legend_title_text="",
    legend=dict(orientation="v", yanchor="top", y=1.02, xanchor="right", x=1),
    updatemenus=[
        dict(
            type="buttons",
            direction="left",
            buttons=list(
                [
                    dict(
                        args=[{"yaxis.type": "linear"}],
                        label="Linear",
                        method="relayout",
                    ),
                    dict(args=[{"yaxis.type": "log"}], label="Log", method="relayout"),
                ]
            ),
        ),
    ],
)
figure.show("notebook")

Visualize the trend of confirmed cases over time during the reporting period

(Data from 03/10/2020 to 11/10/2020)

In [7]:
fig = px.scatter(
    statistical_summary,
    x="Date",
    y="num_COVID",
    animation_frame="Date",
    range_y=[0, 250],
    range_x=["2020-03-10", "2020-11-10"],
)
fig.update_traces(
    marker=dict(size=12, line=dict(width=1, color="Black")),
    selector=dict(mode="markers"),
)
fig.update_layout(
    hovermode="x unified",
    title="Number of Confirmed COVID-19 Cases",
    xaxis_title="Date",
    yaxis_title="Number of Confirmed Cases",
    autosize=True,
    width=900,
    height=650,
    legend_title_text="",
    legend=dict(orientation="v", yanchor="top", y=1.02, xanchor="right", x=1),
)
fig.update_layout(sliders=[dict(transition=dict(duration=5), len=0.91)])
fig.show("notebook")

The number of inpatients varies by preexisting health conditions

(Data from 05/28/2020 to 10/15/2020 are shown as an example. Users can adjust the date range by changing range_x.)

In [8]:
df1 = statistical_summary[
    [
        "Date",
        "num_asth",
        "num_card",
        "num_chf",
        "num_diab",
        "num_obes",
        "num_pneu",
        "num_resp",
    ]
]
df1 = df1.rename(
    columns={
        "num_asth": "Asthma",
        "num_card": "Cardiovascular disease",
        "num_chf": "Congestive heart failure",
        "num_diab": "Diabetes",
        "num_obes": "Obesity",
        "num_pneu": "Pneumonia",
        "num_resp": "Respiratory conditions",
    }
)
df2 = df1.melt(
    id_vars=["Date"], var_name="Health condition", value_name="Number of Patients"
)
fig = px.line(
    df2,
    x="Date",
    y="Number of Patients",
    color="Health condition",
    range_x=["2020-05-28", "2020-10-15"],
)
fig.update_layout(
    title="Number of Inpatients with Various Preexisting Health Conditions",
    hoverlabel=dict(font_size=10),
    hovermode="x unified",
    updatemenus=[
        dict(
            type="buttons",
            direction="left",
            buttons=list(
                [
                    dict(
                        args=[{"yaxis.type": "linear"}],
                        label="Linear",
                        method="relayout",
                    ),
                    dict(args=[{"yaxis.type": "log"}], label="Log", method="relayout"),
                ]
            ),
        ),
    ],
)
fig.show("notebook")
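
The melt step above is what lets px.line draw one colored line per condition: it reshapes the table from one column per condition (wide) to one row per date and condition (long). A small sketch with made-up numbers shows the reshaping on its own.

```python
import pandas as pd

# Hypothetical counts for illustration only; not real report data.
wide = pd.DataFrame(
    {
        "Date": ["2020-05-28", "2020-05-29"],
        "Asthma": [10, 12],
        "Diabetes": [20, 18],
    }
)

# Each non-id column becomes a set of rows: its name goes into
# "Health condition" and its values into "Number of Patients".
long = wide.melt(
    id_vars=["Date"], var_name="Health condition", value_name="Number of Patients"
)
print(long)
```

The two-row, two-condition table becomes four rows, which is the shape the color argument of px.line expects.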

Data Disclaimer

The statistical summary reports are collected from various sources without any normalization and may reflect inconsistent submissions. This demo notebook is intended only to show users who may be unfamiliar with Jupyter Notebooks how to access and visualize the data and explore the trends shown by the SSRs. The data themselves should not be regarded as containing useful information. They are not intended to constitute advice, nor are they to be used as a substitute for professional decision making. Users should not act upon the information here without independently verifying it and obtaining any necessary professional advice.

Data Currency

This data may be updated at irregular intervals. Users should check for updates regularly and ensure the most current version of the data is being used.